{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "http://cs229.stanford.edu/ps/ps1/ps1.pdf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# (a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Poisson distribution:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "p(y; \\lambda)\n",
    "&= \\frac{e^{-\\lambda} \\lambda ^y}{y!} \\\\\n",
    "&= \\frac{1}{y!}\\exp \\big(y \\ln{\\lambda} - \\lambda\\big) \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So in the form exponential family $p(y;\\eta) = b(y) \\exp\\big(\\eta T(y) - a(\\eta)\\big) $, it becomes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "b(y) &= \\frac{1}{y!} \\\\\n",
    "\\eta &= \\ln \\lambda \\\\\n",
    "T(y) &= y \\\\\n",
    "a(\\eta) &= \\lambda = e^{\\eta} \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# (b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Canoical response function, i.e. the function ($g$) that gives the distribution’s mean as a function of the natural parameter ($\\eta$) based on http://cs229.stanford.edu/notes/cs229-notes1.pdf:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "g(\\eta)\n",
    "&= \\mathbb{E}[T(y); \\eta] \\\\\n",
    "&= \\mathbb{E}[y; \\eta] \\\\\n",
    "&= \\lambda \\\\\n",
    "&= e^{\\eta} \\\\\n",
    "&= e^{\\theta^T x} \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note $\\eta = \\theta^Tx$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# (c)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Log-likelihood:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "l(\\theta) &= \\log p(y; \\lambda) \\\\\n",
    "          &= \\log \\frac{1}{y!} + y \\ln \\lambda - \\lambda \\\\\n",
    "          &= \\log \\frac{1}{y!} + y \\eta - e^{\\eta} \\\\\n",
    "          &= \\log \\frac{1}{y!} + y \\theta^T x - e^{\\theta^T x}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Its derivative:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\frac{\\partial l(\\theta)}{\\partial \\theta_j}\n",
    "&= y x_j - e^{\\theta^T x} x_j \\\\\n",
    "&= \\big(y - e^{\\theta^T x}\\big) x_j \\\\\n",
    "&= \\big(y - e^{\\eta}\\big) x_j \\\\\n",
    "&= (y - \\lambda) x_j\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So the stochastic gradient ascent rule:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\theta_j &:= \\theta_j + \\alpha \\big(y^{(i)} - e^{\\theta^T x^{(i)}}\\big) x_j^{(i)}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# (d)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "GLM:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$p(y;\\eta) = b(y) \\exp\\big(\\eta T(y) - a(\\eta)\\big) $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given $T(y) = y$, so the GLM becomes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$p(y;\\eta) = b(y) \\exp\\big(\\eta y - a(\\eta)\\big) $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and the log-likelihood is "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\\log p(y;\\eta) = \\log b(y) + \\eta y - a(\\eta)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The derivative:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\frac{\\partial \\log p(y;\\eta)}{\\partial \\theta_j} \n",
    "&= 0 + y \\frac{\\partial \\eta}{\\theta_j} - \\frac{\\partial a(\\eta)}{\\partial \\eta} \\frac{\\partial \\eta}{\\theta_j} \\\\\n",
    "&= y x_j  - \\frac{\\partial a(\\eta)}{\\partial \\eta} x_j \\\\\n",
    "&= \\Big(y - \\frac{\\partial a(\\eta)}{\\partial \\eta}\\Big) x_j\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So we need to work out what $\\frac{\\partial a(\\eta)}{\\partial \\eta}$ is."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note the following (ref: http://people.stat.sfu.ca/~raltman/stat402/402L6.pdf):"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ \\int_{y}p(y;\\eta) dy = 1 $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Taking derivative on both side w.s.t $\\eta$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ \\frac{d}{d \\eta}\\int_{y}p(y;\\eta) dy = 0 $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ \\int_{y} \\frac{d}{d \\eta} p(y;\\eta) dy = 0 $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Transforming the left side,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\int_{y} \\frac{d}{d \\eta} p(y;\\eta) dy\n",
    "&= \\int_{y} \\frac{d}{d \\eta} \\Big(b(y) \\exp\\big(\\eta y - a(\\eta)\\big)\\Big) dy \\\\\n",
    "&= \\int_{y} b(y) \\exp \\big(\\eta y - a(\\eta) \\big)\\Big(y - \\frac{d a(\\eta)}{d \\eta}\\Big) dy \\\\\n",
    "&= \\int_{y} p(y; \\eta) \\Big(y - \\frac{d a(\\eta)}{d \\eta}\\Big) dy \\\\\n",
    "&= \\int_{y} p(y; \\eta) y - \\int_{y} p(y;\\eta) \\frac{d a(\\eta)}{d \\eta} dy \\\\\n",
    "&= h(x) - \\frac{d a(\\eta)}{d \\eta} \\int_{y}p dy \\\\\n",
    "&= h(x) - \\frac{d a(\\eta)}{d \\eta} 1 \\\\\n",
    "&= h(x) - \\frac{d a(\\eta)}{d \\eta} \\\\\n",
    "&= 0 \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Therefore,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ \\frac{\\partial a(\\eta)}{\\partial \\eta} = h(x)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Taking it back to the derivative,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ \\frac{\\partial \\log p(y;\\eta)}{\\partial \\theta_j}  = \\big(y - h(x)\\big) x_j$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So the stochastic gradient ascent rule:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\theta_j \n",
    "&:= \\theta_j + \\alpha \\big(y^{(i)} - h(x^{(i)} \\big)x_j^{(i)} \\\\\n",
    "&:= \\theta_j - \\alpha \\big(h(x^{(i)} - y^{(i)} \\big)x_j^{(i)}\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's very interesting to know that all GLM where $T(y) = y$ have the same update rule for stochastic gradient ascent."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
